Morphological Analysis of the Spontaneous Speech Corpus

Authors

  • Kiyotaka Uchimoto
  • Chikashi Nobata
  • Atsushi Yamada
  • Satoshi Sekine
  • Hitoshi Isahara
Abstract

This paper describes a project to tag a spontaneous speech corpus with morphological information such as word segmentation and parts of speech. We use a morphological analysis system based on a maximum entropy model, which is independent of the corpus domain. We report the tagging accuracy achieved with the model and discuss problems in tagging the spontaneous speech corpus. We also show that a dictionary developed for a corpus in one domain helps improve accuracy when analyzing a corpus in another domain.

Similar resources

Comparison of the Morphosyntactic Features of Speech in Non-Fluent Agrammatic Patients and Healthy Persian Speakers

Background and purpose: The main features of non-fluent aphasia are inadequate production, limited vocabulary and agrammatism. Such patients have deficits in sentence comprehension and production and their speech is short and telegraphic. In this study, morphological and syntactic errors in speech of non-fluent aphasia were compared with those in healthy subjects. Materials and methods: A ...

Morphological Analysis of a Large Spontaneous Speech Corpus in Japanese

This paper describes two methods for detecting word segments and their morphological information in a Japanese spontaneous speech corpus, and describes how to tag a large spontaneous speech corpus accurately by using the two methods. The first method is used to detect any type of word segments. The second method is used when there are several definitions for word segments and their POS categori...

Morphological Annotation of a Large Spontaneous Speech Corpus in Japanese

We propose an efficient framework for human-aided morphological annotation of a large spontaneous speech corpus such as the Corpus of Spontaneous Japanese. In this framework, even when word units have several definitions in a given corpus, and not all words are found in a dictionary or in a training corpus, we can morphologically analyze the given corpus with high accuracy and low labor costs by...

Progress of Speech Recognition using the Corpus of Spontaneous Japanese (CSJ)

The report gives an overview of the current state of spontaneous speech recognition using the "Corpus of Spontaneous Japanese (CSJ)". It is shown that the large-scale corpus had a strong impact on training acoustic and language models that account for the morphological and pronunciation variations characteristic of spontaneous Japanese. Unsupervised adaptation of these models and the speaking ra...

Automatic Speech Transcription and Archiving System using the Corpus of Spontaneous Japanese

The target of automatic speech recognition (ASR) research has shifted from read speech to spontaneous speech. The technology will enable automatic transcription (and translation) of lectures and meetings. In Japan, the "Spontaneous Speech" project has been conducted over the last five years, and we set up the huge "Corpus of Spontaneous Japanese (CSJ)", which consists of over 2000 speeches (500 hou...

Publication date: 2002